
DF1 - Employment

## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K202301 and caching the dataset for faster future access.
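The messages above are what tidycensus prints when a table is requested from the ACS 1-year Supplemental Estimates. A minimal sketch of such a request follows, assuming the data were pulled with `tidycensus::get_acs()` at the state level. The wrapper name `fetch_acsse_table` and the `geography = "state"` setting are assumptions for illustration, not taken from the project code, and a Census API key must be configured before the function is called.

```r
# Hypothetical helper: download one ACS Supplemental Estimates table by state.
# Requires the tidycensus package and a Census API key (see census_api_key()).
fetch_acsse_table <- function(table_id, year = 2021) {
  tidycensus::get_acs(
    geography   = "state",   # assumption: one row per state and variable
    table       = table_id,  # e.g. "K202301" for the employment table
    year        = year,
    survey      = "acsse",   # ACS 1-year Supplemental Estimates
    cache_table = TRUE       # cache the variable list for later calls
  )
}

# Example call (needs network access and an API key, so it is not run here):
# df1 <- fetch_acsse_table("K202301")
```

The same helper could be reused for the other tables listed in this report by changing only the table ID.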




DF2 - Education

## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K201501 and caching the dataset for faster future access.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) based on percentage

## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
##   # Was:
##   data %>% select(percentage_columns)
## 
##   # Now:
##   data %>% select(all_of(percentage_columns))
## 
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
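The deprecation warning above is raised when a character vector of column names is passed directly to `select()`. A small sketch of the recommended form follows, using made-up data. The column names and values here are illustrative assumptions; only the name `percentage_columns` appears in the warning itself.

```r
library(dplyr)

# Toy data standing in for one of the ACS tables (values are made up).
data <- data.frame(
  NAME         = c("State A", "State B"),
  employed_pct = c(58.2, 61.5),
  college_pct  = c(31.0, 27.4),
  total        = c(1000, 2000)
)

# Column names held in an external character vector.
percentage_columns <- c("employed_pct", "college_pct")

# all_of() marks the vector as a set of column names and
# raises an error if any of them is missing from the data.
pct_data <- data %>% select(all_of(percentage_columns))
```

If some of the listed columns may legitimately be absent, `any_of()` can be used instead; it silently skips missing names.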



DF3 - Citizenship

## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K200501 and caching the dataset for faster future access.
## Warning: Removed 1 rows containing missing values (`position_stack()`).




DF4 - Age

## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K200104 and caching the dataset for faster future access.




DF5 - Housing

## Getting data from the 2021 1-year ACS
## The 1-year ACS provides data for geographies with populations of 65,000 and greater.
## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K202502 and caching the dataset for faster future access.

DF6 - Disabilities

## Getting data from the ACS 1-year Supplemental Estimates.  Data are available for geographies with populations of 20,000 and greater.
## Loading ACSSE variables for 2021 from table K201803 and caching the dataset for faster future access.
## # A tibble: 52 × 10
##    NAME          Total_people Total With Disabilit…¹ Hearing `Vision difficulty`
##    <chr>                <dbl>                  <dbl>   <dbl>               <dbl>
##  1 Alabama            4957633                 808071  208028              152798
##  2 Alaska              702154                  92390   33397               15748
##  3 Arizona            7174053                 972252  298849              180792
##  4 Arkansas           2974701                 517051  142133              105624
##  5 California        38724294                4324355 1140131              844049
##  6 Colorado           5715497                 640346  211803              120570
##  7 Connecticut        3557526                 427014  113490               78078
##  8 Delaware            987964                 130551   37933               25335
##  9 District of …       659979                  76754   14429               14569
## 10 Florida           21465883                2906367  812248              555361
## # ℹ 42 more rows
## # ℹ abbreviated name: ¹​`Total With Disabilities`
## # ℹ 5 more variables: cognative <dbl>, `ambulatory difficulty` <dbl>,
## #   `Self-care difficulty` <dbl>, `Independent living difficulty` <dbl>,
## #   `No Disability` <dbl>
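The counts in this table scale with state population, so a percentage is easier to compare across states. A short sketch follows, using the first three rows printed above. The column names `total_with_disabilities` and `disability_pct` are introduced here for illustration and are not part of the project code.

```r
# First three rows of the table printed above.
df6_head <- data.frame(
  NAME                    = c("Alabama", "Alaska", "Arizona"),
  Total_people            = c(4957633, 702154, 7174053),
  total_with_disabilities = c(808071, 92390, 972252)
)

# Share of each state's population reporting any disability, in percent.
df6_head$disability_pct <-
  100 * df6_head$total_with_disabilities / df6_head$Total_people

# States ordered from highest to lowest share.
df6_head[order(-df6_head$disability_pct), c("NAME", "disability_pct")]
```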

## 
## Call:
## lm(formula = `ambulatory difficulty` ~ Hearing + `Vision difficulty` + 
##     cognative + `Self-care difficulty` + `Independent living difficulty`, 
##     data = df6_wide)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -56621 -11941  -2766  17524  73868 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     -8200.5856  5646.4977  -1.452 0.153197    
## Hearing                             0.6796     0.1793   3.791 0.000436 ***
## `Vision difficulty`                 1.0070     0.1644   6.126 1.87e-07 ***
## cognative                          -0.5426     0.2291  -2.369 0.022116 *  
## `Self-care difficulty`             -1.9464     0.4434  -4.390 6.59e-05 ***
## `Independent living difficulty`     1.9228     0.3144   6.115 1.95e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24940 on 46 degrees of freedom
## Multiple R-squared:  0.9968, Adjusted R-squared:  0.9965 
## F-statistic:  2909 on 5 and 46 DF,  p-value: < 2.2e-16
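One point to keep in mind when reading this fit: every variable is a raw head count, so all of them grow with state population, and that alone can push R-squared close to 1. The sketch below illustrates the effect on synthetic data only. None of the numbers come from the ACS, and the variable names are made up for the example. It compares a fit on counts with a fit on per-capita rates.

```r
set.seed(1)  # synthetic data only; nothing here comes from the ACS

n          <- 52
population <- round(runif(n, 5e5, 4e7))

# Made-up rates, drawn independently of each other.
hearing_rate    <- runif(n, 0.03, 0.05)
ambulatory_rate <- runif(n, 0.05, 0.08)

toy <- data.frame(
  Hearing         = population * hearing_rate,
  ambulatory      = population * ambulatory_rate,
  hearing_rate    = hearing_rate,
  ambulatory_rate = ambulatory_rate
)

# Same relationship fitted on counts and on rates.
fit_counts <- lm(ambulatory ~ Hearing, data = toy)
fit_rates  <- lm(ambulatory_rate ~ hearing_rate, data = toy)

r2_counts <- summary(fit_counts)$r.squared
r2_rates  <- summary(fit_rates)$r.squared
c(counts = r2_counts, rates = r2_rates)
```

In this toy setup the rates are unrelated by construction, yet the count-based fit still shows a high R-squared because both columns track population. Refitting the model above on percentages would be one way to check how much of the reported fit reflects state size.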



Part 1

Title

“Socio-Economic Factors Influencing Employment in the United States: A Comprehensive State-by-State Analysis”

Authors

  1. Abhay Prasanna Rao

  2. Srika Raja

  3. Neha

  4. Esha

  5. Niharika

Abstract (TL;DR)

This project investigates how socio-economic factors relate to employment rates across U.S. states. Using ACS 2021 data, we explore the relationships between employment and variables such as education, citizenship, age, housing, and disability. We find significant correlations between these factors and state-level employment.

Motivation

We aim to analyze the socio-economic factors that influence employment in the U.S. Understanding how education, age, housing, and related factors contribute to employment rates can help policymakers and researchers.

Summary

We imported the data sets from the 2021 American Community Survey (ACS). The project has six child RMD files, one each for the Employment, Education, Citizenship, Age, Housing, and Disabilities data sets.
We first explored each data set in detail, then combined each one with the employment data to see how it relates to employment. We found direct relationships between each data set and the employment data. Our conclusions are presented in the Final Report.
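The combining step described above can be sketched as a join on the state name followed by a correlation. This is a minimal illustration on made-up values, assuming each table has one row per state and a shared `NAME` column. The column names `employed_pct` and `bachelors_pct` are hypothetical.

```r
# Toy stand-ins for two per-state tables (values are made up).
employment <- data.frame(
  NAME         = c("State A", "State B", "State C", "State D"),
  employed_pct = c(58, 61, 55, 63)
)
education <- data.frame(
  NAME          = c("State B", "State A", "State D", "State C"),
  bachelors_pct = c(24, 20, 27, 18)
)

# Join on the state name so each row pairs both measures for the same state.
combined <- merge(employment, education, by = "NAME")

# Pearson correlation between the two measures across states.
r <- cor(combined$employed_pct, combined$bachelors_pct)
```

Joining by name rather than by row position guards against the tables being sorted differently, as they are in this toy example.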

Part 2

Data Sets

The data sets below are from data.census.gov (United States Census Bureau). We selected them from the ACS 2021 release; each covers all states in the United States.

  1. Employment - K202301

    Variable              Description
    Total                 Total Employment Data
    In Labor Force        Total People in Labor Force
    Civilian labor force  Total People in Civilian Labor Force
    Employed              Total People Employed
    Unemployed            Total People Unemployed
    In Armed Forces       Total People in Armed Forces
    Not in labor force    Total People not in Labor Force
  2. Education - K201501

    Variable                                Description
    Education_Total_students                Total students in the education survey
    Education_Below_9th grade               Number of students who have not completed 9th grade
    Education_9th to 12th grade_no diploma  Number of students who completed 9th to 12th grade but have no diploma
    Education_High_school_graduate
    Education_Some college_no degree
    Education_Associates_degree
    Education_Bachelors_degree
    Education_Graduate_professional degree
  3. Citizenship - K200501

    Variable Description
  4. Age - K200104

    Variable Description
  5. Housing - K202502

    Variable Description
  6. Disabilities - K201803

    Variable Description